Muli-label Text Categorization with Hidden Components
نویسندگان
چکیده
Multi-label text categorization (MTC) is supervised learning, where a document may be assigned with multiple categories (labels) simultaneously. The labels in the MTC are correlated and the correlation results in some hidden components, which represent the ”share” variance of correlated labels. In this paper, we propose a method with hidden components for MTC. The proposed method employs PCA to capture the hidden components, and incorporates them into a joint learning framework to improve the performance. Experiments with real-world data sets and evaluation metrics validate the effectiveness of the proposed method.
منابع مشابه
A Comparison of Text Categorization Methods
In this paper firstly I have compared Single Label Text Categorization with Multi Label Text Categorization in detail then I have compared Document Pivoted Categorization with Category Pivoted Categorization in detail. For this purpose I have given the general definition of Text Categorization with its mathematical notation for the purpose of its frugality and cost effectiveness. Then with the ...
متن کاملExploiting Associations between Class Labels in Multi-label Classification
Multi-label classification has many applications in the text categorization, biology and medical diagnosis, in which multiple class labels can be assigned to each training instance simultaneously. As it is often the case that there are relationships between the labels, extracting the existing relationships between the labels and taking advantage of them during the training or prediction phases ...
متن کاملAutomated multi-label text categorization with VG-RAM weightless neural networks
In automated multi-label text categorization, an automatic categorization system should output a label set, whose size is unknown a priori, for each document under analysis. Many machine learning techniques have been used for building such automatic text categorization systems. In this paper, we examine virtual generalizing random access memory weightless neural networks (VG-RAM WNN), an effect...
متن کاملBoostexter: a System for Multiclass Multi-label Text Categorization
This work focuses on algorithms which learn from examples to perform multiclass text and speech categorization tasks. We rst show how to extend the standard notion of classiication by allowing each instance to be associated with multiple labels. We then discuss our approach for multiclass multi-label text categorization which is based on a new and improved family of boosting algorithms. We desc...
متن کاملError-Correcting Output Codes for Multi-Label Text Categorization
When a sample belongs to more than one label from a set of available classes, the classification problem (known as multi-label classification) turns to be more complicated. Text data, widely available nowadays in the world wide web, is an obvious instance example of such a task. This paper presents a new method for multi-label text categorization created by modifying the Error-Correcting Output...
متن کامل